On the tightness of an SDP relaxation of $k$-means
Abstract

Recently, [5] introduced an SDP relaxation of the $k$-means problem in $\mathbb{R}^m$. In this work, we
consider a random model for the data points in which $k$ balls of unit radius are deterministically
distributed throughout $\mathbb{R}^m$, and then in each ball, $n$ points are drawn according to a common
rotationally invariant probability distribution. For any fixed ball configuration and probability
distribution, we prove that the SDP relaxation of the $k$-means problem exactly recovers these
planted clusters with probability $1 - e^{-\Omega(n)}$ provided the distance between any two of the ball
centers is $> 2 + \epsilon$, where $\epsilon$ is an explicit function of the configuration of the ball centers, and
can be arbitrarily small when $m$ is large.


1 Introduction 


Clustering is a central task in unsupervised machine learning. The problem consists of partitioning
a given finite set $P$ into $k$ subsets $C = \{C_1, \ldots, C_k\}$ such that some dissimilarity function
is minimized. Usually the similarity criterion is chosen ad hoc with an application in mind. A
particularly common clustering criterion is the $k$-means objective. Let $P \subset \mathbb{R}^m$ be a finite set. For
$C_i \subseteq P$, let $c_i$ be its centroid $c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$. Then the $k$-means problem is

\[
\min_{\substack{C_1 \cup \cdots \cup C_k = P \\ C_i \cap C_j = \emptyset}} \; \sum_{i=1}^{k} \sum_{x \in C_i} \|x - c_i\|_2^2. \tag{1}
\]


Problem (1) is NP-hard in general [5]. A popular approach to solving this problem is the
heuristic algorithm by Lloyd, also known as the $k$-means algorithm [7]. This algorithm alternates
between calculating centroids of proto-clusters and reassigning points according to the nearest
centroid. Lloyd's algorithm (and its variants [2, 10]) may, in general, converge to local minima
of the $k$-means objective (see for example section 5 of [3]). Furthermore, the output of Lloyd's
algorithm does not indicate how far it is from optimal. As such, a slower algorithm that also
certifies how close its output is to optimal may be preferable.
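
To make the alternation concrete, the following is a minimal Python sketch of Lloyd's heuristic. The function name `lloyd`, the initialization from $k$ randomly chosen data points, and the iteration cap are illustrative assumptions rather than details taken from [7].

```python
import random


def lloyd(points, k, iters=100, seed=0):
    """Lloyd's heuristic: alternate nearest-centroid assignment and centroid updates.

    `points` is a list of equal-length tuples. The initialization (k data points
    drawn at random) is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest current centroid.
        new_labels = [
            min(range(k), key=lambda t: sum((x - c) ** 2 for x, c in zip(p, centroids[t])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: a local minimum has been reached
        labels = new_labels
        # Update step: each centroid becomes the mean of its proto-cluster.
        for t in range(k):
            members = [p for p, lab in zip(points, labels) if lab == t]
            if members:
                centroids[t] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Note that the returned labels carry no information about how far the clustering is from the global optimum, which is the shortcoming discussed above.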

Along these lines, convex relaxations provide a framework to attack NP-hard combinatorial 
problems. This framework is known as the “relax and round” paradigm. Given an optimization 
problem, first relax the feasibility region to a convex set, optimize subject to this larger set, and then 
round this optimal solution to a point in the original feasibility region. One may seek approximation 
guarantees in this framework by relating the value of the rounded solution to the value of the optimal 
solution. Convex relaxations of clustering problems have been studied previously, and a particular
relaxation of $k$-means is known to satisfy an approximation ratio [6].

Sometimes, the rounding step of the approximation algorithm is unnecessary because the convex 
relaxation happens to find a solution that is feasible in the original problem. This phenomenon is 
known as exact recovery, tightness, or integrality of the convex relaxation. Note that when exact 
recovery occurs, the algorithm not only provides a solution, but also a certificate of its optimality.

Method          Sufficient Condition                          Optimal?   Reference
Thresholding    $\Delta > 4$                                  Yes        (simple exercise)
$k$-medians LP  $\Delta > 4$                                  No         Theorem 2 in [4]
                $\Delta > 3.75$                               No         Theorem 1 in [9]
                $\Delta > 2$                                  Yes        Theorem 1 in [3]
$k$-means LP    $\Delta > 4$                                  Yes        Theorem 9 in [3]
$k$-means SDP   $\Delta > 2\sqrt{2}(1 + 1/\sqrt{m})$          No         Theorem 3 in [3]
                $\Delta > 2$                                  Yes*       Conjecture 4 in [3]
                $\Delta > 2 + k^2 \mathrm{Cond}(\gamma)/m$    No*        Theorem 2

Table 1: Summary of cluster recovery guarantees under the stochastic ball model. The second
column reports sufficient separation between ball centers in order for the corresponding method
to provably give exact recovery with high probability. (*) We report whether these bounds are
optimal under the assumption of Conjecture 4 in [3].


thanks to convex duality. This paper focuses on exact recovery under a particular convex relaxation
of the $k$-means problem.

1.1 Integrality of convex relaxations of geometric clustering

When is a convex relaxation of geometric clustering tight? This question seems to have first
appeared in [1], where the authors study an LP relaxation of the $k$-median objective (a problem
which is similar to $k$-means). That first paper proves tightness of the relaxation provided the set
of points $P$ admits a partition into $k$ clusters of equal size, and the separation distance between
any two clusters is sufficiently large. Later on, [9] studied integrality of another LP relaxation to
the $k$-median objective. That paper introduced a distribution on the input $P$, which we refer to as
the stochastic ball model:

Definition 1 ($(\mathcal{D}, \gamma, n)$-stochastic ball model). Let $\{\gamma_a\}_{a=1}^k$ be ball centers in $\mathbb{R}^m$. For each $a$,
draw iid vectors $\{r_{a,i}\}_{i=1}^n$ from some rotation-invariant distribution $\mathcal{D}$ supported on the unit ball.
The points from cluster $a$ are then taken to be $x_{a,i} := r_{a,i} + \gamma_a$.
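
As an illustration of Definition 1, the sketch below draws an instance of the model in Python. It assumes one particular rotation-invariant choice of distribution, namely the uniform distribution on the unit ball; the function names are hypothetical and the definition itself allows any rotation-invariant law supported on the ball.

```python
import math
import random


def sample_unit_ball(m, rng):
    """Uniform draw from the unit ball in R^m (an assumed choice of distribution)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]   # Gaussian direction is rotation invariant
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / m)            # radial law of the uniform ball measure
    return [radius * v / norm for v in g]


def stochastic_ball_model(centers, n, seed=0):
    """Return one list of n points x_{a,i} = r_{a,i} + gamma_a per ball center gamma_a."""
    rng = random.Random(seed)
    m = len(centers[0])
    return [
        [[c + r for c, r in zip(center, sample_unit_ball(m, rng))] for _ in range(n)]
        for center in centers
    ]
```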

Table 1 summarizes the state of the art for recovery guarantees under the stochastic ball model.
In [9], it was shown that the LP relaxation of $k$-medians will, with high probability, recover clusters
drawn from the stochastic ball model provided the smallest distance between ball centers is $\Delta >
3.75$. Note that exact recovery only makes sense for $\Delta > 2$ (i.e., when the balls are disjoint). Once
$\Delta > 4$, any two points within a particular cluster are closer to each other than any two points from
different clusters, and so in this regime, cluster recovery follows from a simple thresholding.
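
The thresholding argument can be sketched as follows. When $\Delta > 4$, points in a common unit ball are at distance at most 2, while points in different balls are at distance greater than $\Delta - 2 > 2$, so a threshold of 2 separates the two cases. The function name and the greedy representative scheme are assumptions of this sketch.

```python
import math


def threshold_clusters(points, tau=2.0):
    """Group points whose distance to a cluster representative is at most tau.

    Valid under the assumption that every within-cluster distance is <= tau and
    every between-cluster distance is > tau (the regime Delta > 4 with tau = 2).
    """
    reps, labels = [], []
    for p in points:
        for t, r in enumerate(reps):
            if math.dist(p, r) <= tau:
                labels.append(t)
                break
        else:
            # No existing representative is close enough: p starts a new cluster.
            reps.append(p)
            labels.append(len(reps) - 1)
    return labels
```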

For the $k$-means problem, [3] provides an SDP relaxation and demonstrates exact recovery in
the regime $\Delta > 2\sqrt{2}(1 + 1/\sqrt{m})$, where $m$ is the dimension of the Euclidean space. That work also
conjectures that the result holds for optimal separation $\Delta > 2$. The present work demonstrates
tightness given near-optimal separation:

Theorem 2 (Main result). The $k$-means SDP relaxation (3) from [3] recovers the planted clusters
in the $(\mathcal{D},\gamma,n)$-stochastic ball model with probability $1 - e^{-\Omega_{\mathcal{D},\gamma}(n)}$ provided

\[
\Delta > 2 + \frac{k^2}{m}\,\mathrm{Cond}(\gamma),
\]

Figure 1: Frequency of successful certification for the SDP relaxation of $k$-means (left: our
certificate, right: certificate from [3]; the horizontal axis of each panel is the distance). Lighter
color represents higher probability of success. Area
to the right of the vertical line corresponds to the regime where exact recovery was proven with
high probability. We generate 30 random instances of our $(\mathcal{D}, \gamma, n)$-stochastic ball model for each
given distance and number of points. Here we take $\mathcal{D}$ as the uniform distribution in the unit ball
in $\mathbb{R}^6$.


where

\[
\mathrm{Cond}(\gamma) := \frac{\max_{a,b \in \{1,\ldots,k\},\, a \neq b} \|\gamma_a - \gamma_b\|^2}{\min_{a,b \in \{1,\ldots,k\},\, a \neq b} \|\gamma_a - \gamma_b\|^2}.
\]
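
For a concrete feel for the hypothesis of Theorem 2, the sketch below evaluates the separation threshold $2 + (k^2/m)\,\mathrm{Cond}(\gamma)$ for a given list of centers. It assumes that $\mathrm{Cond}(\gamma)$ is the ratio of the largest to the smallest squared distance between distinct centers; the function names are hypothetical.

```python
import itertools
import math


def cond(centers):
    """Assumed form of Cond(gamma): max over min squared distance between distinct centers."""
    sq = [
        sum((x - y) ** 2 for x, y in zip(p, q))
        for p, q in itertools.combinations(centers, 2)
    ]
    return max(sq) / min(sq)


def separation_threshold(centers):
    """Right-hand side 2 + (k^2 / m) * Cond(gamma) of the condition in Theorem 2."""
    k, m = len(centers), len(centers[0])
    return 2.0 + (k ** 2 / m) * cond(centers)


def min_separation(centers):
    """Delta: the smallest distance between two distinct ball centers."""
    return min(math.dist(p, q) for p, q in itertools.combinations(centers, 2))
```

Comparing `min_separation(centers)` with `separation_threshold(centers)` indicates whether a configuration falls in the regime covered by the theorem; since the threshold scales like $k^2/m$, it approaches 2 only when $m$ is large relative to $k^2$.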


Our proof of Theorem 2 follows the strategy of [3], namely, to identify a dual certificate of the
SDP, and then show that this certificate exists for a suitable regime of $\Delta$'s under the stochastic
ball model. Figure 1 provides numerical simulations that illustrate the empirical performance of
our dual certificate in comparison with the one provided in [3].

This paper is organized as follows. Section 2 formulates a semidefinite relaxation of $k$-means
and derives its dual. Section 3 provides deterministic conditions that guarantee the solution of the
relaxation is feasible for $k$-means. Section 4 proves Theorem 2 by showing that the deterministic
conditions are satisfied with high probability under the $(\mathcal{D}, \gamma, n)$-stochastic ball model.



2 Background 


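Given $P \subset \mathbb{R}^m$ with $|P| = N$, we seek to solve the $k$-means problem (1), which is well known to be
equivalent to

\[
\begin{aligned}
\text{minimize} \quad & \sum_{t=1}^{k} \frac{1}{|A_t|} \sum_{x_i, x_j \in A_t} \|x_i - x_j\|_2^2 \\
\text{subject to} \quad & A_1 \sqcup \cdots \sqcup A_k = P.
\end{aligned} \tag{2}
\]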

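This problem is NP-hard in general [1]. However, many instances of this problem can be solved by
relaxing to the following SDP:

\[
\begin{aligned}
\text{maximize} \quad & -\mathrm{Tr}(DX) \\
\text{subject to} \quad & \mathrm{Tr}(X) = k, \quad X1 = 1, \quad X \geq 0, \quad X \succeq 0.
\end{aligned} \tag{3}
\]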


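Here, $D$ denotes the matrix whose $(i,j)$th entry is $\|x_i - x_j\|_2^2$. Observe that (3) is indeed a relaxation
of (2): Let $1_A$ denote the indicator function of $A \subseteq \{1,\ldots,N\}$. Then taking $X := \sum_{t=1}^k \frac{1}{|A_t|} 1_{A_t} 1_{A_t}^\top$
gives that

\[
\mathrm{Tr}(DX) = \mathrm{Tr}\Big( \sum_{t=1}^k \frac{1}{|A_t|} D 1_{A_t} 1_{A_t}^\top \Big) = \sum_{t=1}^k \frac{1}{|A_t|} 1_{A_t}^\top D 1_{A_t} = \sum_{t=1}^k \frac{1}{|A_t|} \sum_{x_i, x_j \in A_t} \|x_i - x_j\|_2^2.
\]

Also, $X$ is clearly feasible in (3), and so we conclude that the SDP (3) is a relaxation of the $k$-means
problem (1).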
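
The feasibility claim and the trace identity can be checked numerically on a toy instance. The sketch below is a plain-Python illustration with hypothetical function names; it also makes visible that the pairwise objective $\mathrm{Tr}(DX)$ equals twice the centroid-based $k$-means value, which is why the two formulations share minimizers.

```python
def sq_dist_matrix(points):
    """D with (i, j) entry ||x_i - x_j||^2."""
    return [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in points] for p in points]


def planted_X(labels, k):
    """X = sum_t (1/n_t) 1_t 1_t^T for the clustering encoded by `labels`."""
    sizes = [labels.count(t) for t in range(k)]
    N = len(labels)
    return [
        [1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0 for j in range(N)]
        for i in range(N)
    ]


def trace_DX(D, X):
    """Tr(DX) computed entrywise."""
    N = len(D)
    return sum(D[i][j] * X[j][i] for i in range(N) for j in range(N))


def kmeans_value(points, labels, k):
    """Sum over clusters of squared distances to the cluster centroid."""
    total = 0.0
    for t in range(k):
        members = [p for p, lab in zip(points, labels) if lab == t]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    return total
```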

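To derive the dual of (3), we will leverage the general setting from cone programming [8],
namely, that given closed convex cones $K$ and $L$, the dual of

\[
\begin{aligned}
\text{maximize} \quad & \langle c, x \rangle \\
\text{subject to} \quad & b - Ax \in L, \quad x \in K
\end{aligned} \tag{4}
\]

is given by

\[
\begin{aligned}
\text{minimize} \quad & \langle b, y \rangle \\
\text{subject to} \quad & A^*y - c \in K^*, \quad y \in L^*,
\end{aligned} \tag{5}
\]

where $A^*$ denotes the adjoint of $A$, while $K^*$ and $L^*$ denote the dual cones of $K$ and $L$, respectively.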
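In our case, $c = -D$, $x = X$, and $K$ is simply the cone of positive semidefinite matrices (as is $K^*$).
Before we determine $L$, we need to interpret the remaining constraints in (3). To this end, we note
that $\mathrm{Tr}(X) = k$ is equivalent to $\langle X, I \rangle = k$, $X1 = 1$ is equivalent to having

\[
\big\langle X, \tfrac{1}{2}(e_i 1^\top + 1 e_i^\top) \big\rangle = 1 \qquad \forall i \in \{1,\ldots,N\},
\]

and $X \geq 0$ is equivalent to having

\[
\big\langle X, \tfrac{1}{2}(e_i e_j^\top + e_j e_i^\top) \big\rangle \geq 0 \qquad \forall i,j \in \{1,\ldots,N\},\ i \leq j.
\]

(These last two equivalences exploit the fact that $X$ is symmetric.) As such, we can express the
remaining constraints in (3) using a linear operator $A$ that sends any matrix $X$ to its inner products
with $I$, $\{\frac{1}{2}(e_i 1^\top + 1 e_i^\top)\}_{i=1}^N$, and $\{\frac{1}{2}(e_i e_j^\top + e_j e_i^\top)\}_{i \leq j}$. The remaining constraints in (3) are
equivalent to having $b - Ax \in L$, where $b = k \oplus 1 \oplus 0$ and $L = 0 \oplus 0 \oplus \mathbb{R}^{N(N+1)/2}_{\leq 0}$. Writing
$y = z \oplus \alpha \oplus (-\beta)$, the dual of (3) is then given by

\[
\begin{aligned}
\text{minimize} \quad & kz + \sum_{i=1}^N \alpha_i \\
\text{subject to} \quad & zI + \sum_{i=1}^N \alpha_i \cdot \tfrac{1}{2}(e_i 1^\top + 1 e_i^\top) - \sum_{i=1}^N \sum_{j=i}^N \beta_{ij} \cdot \tfrac{1}{2}(e_i e_j^\top + e_j e_i^\top) + D \succeq 0, \\
& \beta \geq 0.
\end{aligned} \tag{6}
\]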

Theorem 3 (e.g., see [8]). Suppose the primal program (4) and dual program (5) are feasible and
bounded.

(a) Strong duality. The primal program (4) has optimal value val if and only if the dual program
(5) has bounded optimal value val.

(b) Complementary slackness. The decision variables $x$ and $y$ are optimal in (4) and (5),
respectively, if and only if

\[
\langle A^*y - c, x \rangle = 0 = \langle y, b - Ax \rangle.
\]


For notational simplicity, we organize indices according to clusters. For example, from this
point forward, $1_a$ denotes the indicator function of the $a$th cluster. Also, we shuffle the rows and
columns of $X$ and $D$ into blocks that correspond to clusters; for example, the $(i,j)$th entry of the
$(a,b)$th block of $D$ is given by $\|x_{a,i} - x_{b,j}\|_2^2$. We also index $\alpha$ in terms of clusters; for example, the $i$th
entry of the $a$th block of $\alpha$ is denoted $\alpha_{a,i}$. For $\beta$, we identify

\[
\beta \;\leftrightarrow\; \sum_{i=1}^N \sum_{j=i}^N \beta_{ij} \cdot (e_i e_j^\top + e_j e_i^\top).
\]

Indeed, when $i < j$, the $(i,j)$th entry of $\beta$ is $\beta_{ij}$. From this point forward, we consider $\beta$ as having
its rows and columns shuffled according to clusters, so that the $(i,j)$th entry of the $(a,b)$th block
is $\beta^{(a,b)}_{ij}$.

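Theorem 4. Take $X := \sum_{t=1}^k \frac{1}{n_t} 1_t 1_t^\top$, where $n_t$ denotes the number of points in cluster $t$. The
following are equivalent:

(a) $X$ is a solution to the SDP relaxation (3).

(b) Every solution to the dual SDP (6) satisfies

\[
Q^{(a,a)} 1 = 0, \qquad \beta^{(a,a)} = 0 \qquad \forall a \in \{1,\ldots,k\},
\]

where $Q := A^*y - c$.

(c) Every solution to the dual SDP (6) satisfies

\[
\alpha_{a,r} = -\frac{1}{n_a} z + \frac{1}{n_a^2} 1^\top D^{(a,a)} 1 - \frac{2}{n_a} e_r^\top D^{(a,a)} 1, \qquad \beta^{(a,a)} = 0 \qquad \forall a \in \{1,\ldots,k\},\ r \in a.
\]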

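Proof. (a)$\Leftrightarrow$(b): By complementary slackness, (a) is equivalent to having both

\[
\langle A^*y - c, X \rangle = 0 \tag{7}
\]

and

\[
\langle y, b - A(X) \rangle = 0. \tag{8}
\]

Since $Q \succeq 0$, we have

\[
\langle A^*y - c, X \rangle = \langle Q, X \rangle = \Big\langle Q, \sum_{t=1}^k \frac{1}{n_t} 1_t 1_t^\top \Big\rangle = \sum_{t=1}^k \frac{1}{n_t} 1_t^\top Q 1_t \geq 0,
\]

with equality if and only if $Q 1_a = 0$ for every $a \in \{1,\ldots,k\}$. Next, we recall that $y = z \oplus \alpha \oplus (-\beta)$,
$b - A(X) \in L = 0 \oplus 0 \oplus \mathbb{R}^{N(N+1)/2}_{\leq 0}$, and $b = k \oplus 1 \oplus 0$. As such, (8) is equivalent to $\beta$ having disjoint
support with $\{\langle X, \frac{1}{2}(e_i e_j^\top + e_j e_i^\top) \rangle\}_{i \leq j}$, i.e., $\beta^{(a,a)} = 0$ for every cluster $a$.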

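(b)$\Rightarrow$(c): Take any solution to the dual SDP (6), and note that

\[
\begin{aligned}
Q^{(a,a)} &= \Big( zI + \sum_{t=1}^k \sum_{i \in t} \alpha_{t,i} \cdot \tfrac{1}{2}(e_{t,i} 1^\top + 1 e_{t,i}^\top) - \tfrac{1}{2}\beta + D \Big)^{(a,a)} \\
&= zI + \sum_{i \in a} \alpha_{a,i} \cdot \tfrac{1}{2}(e_i 1^\top + 1 e_i^\top) + D^{(a,a)},
\end{aligned}
\]

where the $1$ vectors in the second line are $n_a$-dimensional (instead of $N$-dimensional, as in the first
line), and similarly for $e_i$ (instead of $e_{t,i}$). We now consider each entry of $Q^{(a,a)} 1$, which is zero by
assumption:

\[
\begin{aligned}
0 = e_r^\top Q^{(a,a)} 1
&= e_r^\top \Big( zI + \sum_{i \in a} \alpha_{a,i} \cdot \tfrac{1}{2}(e_i 1^\top + 1 e_i^\top) + D^{(a,a)} \Big) 1 \\
&= z + \sum_{i \in a} \alpha_{a,i} \cdot \tfrac{1}{2}(n_a \delta_{ir} + 1) + e_r^\top D^{(a,a)} 1.
\end{aligned} \tag{9}
\]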


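As one might expect, these linear equations determine the variables $\{\alpha_{a,i}\}_{i \in a}$. To solve this
system, we first observe

\[
\begin{aligned}
0 = 1^\top Q^{(a,a)} 1
&= 1^\top \Big( zI + \sum_{i \in a} \alpha_{a,i} \cdot \tfrac{1}{2}(e_i 1^\top + 1 e_i^\top) + D^{(a,a)} \Big) 1 \\
&= n_a z + \sum_{i \in a} \alpha_{a,i} \cdot \tfrac{1}{2}(n_a + n_a) + 1^\top D^{(a,a)} 1 \\
&= n_a z + n_a \sum_{i \in a} \alpha_{a,i} + 1^\top D^{(a,a)} 1,
\end{aligned}
\]

and so rearranging gives

\[
\sum_{i \in a} \alpha_{a,i} = -z - \frac{1}{n_a} 1^\top D^{(a,a)} 1.
\]

We use this identity to continue (9):

\[
\begin{aligned}
0 &= z + \sum_{i \in a} \alpha_{a,i} \cdot \tfrac{1}{2}(n_a \delta_{ir} + 1) + e_r^\top D^{(a,a)} 1 \\
&= z + \frac{n_a}{2} \alpha_{a,r} + \frac{1}{2} \sum_{i \in a} \alpha_{a,i} + e_r^\top D^{(a,a)} 1 \\
&= z + \frac{n_a}{2} \alpha_{a,r} + \frac{1}{2} \Big( -z - \frac{1}{n_a} 1^\top D^{(a,a)} 1 \Big) + e_r^\top D^{(a,a)} 1,
\end{aligned}
\]

and rearranging yields the desired formula for $\alpha_{a,r}$.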

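(c)$\Rightarrow$(a): Take any solution to the dual SDP (6). Then by assumption, the dual objective at
this point is given by

\[
\begin{aligned}
kz + \sum_{t=1}^k \sum_{i \in t} \alpha_{t,i}
&= kz + \sum_{t=1}^k \sum_{i \in t} \Big( -\frac{1}{n_t} z + \frac{1}{n_t^2} 1^\top D^{(t,t)} 1 - \frac{2}{n_t} e_i^\top D^{(t,t)} 1 \Big) \\
&= -\sum_{t=1}^k \frac{1}{n_t} 1^\top D^{(t,t)} 1 \\
&= -\mathrm{Tr}(DX),
\end{aligned}
\]

i.e., the primal objective (3) evaluated at $X$. Since $X$ is feasible in the primal SDP, we conclude
that $X$ is optimal by strong duality. □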


3 Finding a dual certificate 

The goal is to certify when the SDP-optimal solution is integral. In this event, Theorem 4
characterizes acceptable dual certificates $(z, \alpha, \beta)$, but this information fails to uniquely determine a
certificate. In this section, we will motivate the application of additional constraints on dual
certificates so as to identify certifiable instances.

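We start by reviewing the characterization of dual certificates $(z, \alpha, \beta)$ provided in Theorem 4.
In particular, $\alpha$ is completely determined by $z$, and so $z$ and $\beta$ are the only remaining free variables.
Indeed, for every $a, b \in \{1,\ldots,k\}$, we have

\[
\begin{aligned}
\Big( \sum_{t=1}^k \sum_{i \in t} \alpha_{t,i} \cdot \tfrac{1}{2}(e_{t,i} 1^\top + 1 e_{t,i}^\top) \Big)^{(a,b)}
= {} & -\frac{z}{2} \Big( \frac{1}{n_a} + \frac{1}{n_b} \Big) 1 1^\top
+ \frac{1}{2} \Big( \frac{1}{n_a^2} 1^\top D^{(a,a)} 1 + \frac{1}{n_b^2} 1^\top D^{(b,b)} 1 \Big) 1 1^\top \\
& - \frac{1}{n_a} D^{(a,a)} 1 1^\top - \frac{1}{n_b} 1 1^\top D^{(b,b)},
\end{aligned}
\]

and so since

\[
Q = zI + \sum_{t=1}^k \sum_{i \in t} \alpha_{t,i} \cdot \tfrac{1}{2}(e_{t,i} 1^\top + 1 e_{t,i}^\top) - \tfrac{1}{2}\beta + D,
\]

we may write $Q = z(I - E) + M - B$, where

\[
\begin{aligned}
E^{(a,b)} &:= \frac{1}{2} \Big( \frac{1}{n_a} + \frac{1}{n_b} \Big) 1 1^\top, \\
M^{(a,b)} &:= D^{(a,b)} - \frac{1}{n_a} D^{(a,a)} 1 1^\top - \frac{1}{n_b} 1 1^\top D^{(b,b)} + \frac{1}{2} \Big( \frac{1}{n_a^2} 1^\top D^{(a,a)} 1 + \frac{1}{n_b^2} 1^\top D^{(b,b)} 1 \Big) 1 1^\top, \\
B^{(a,b)} &:= \frac{1}{2} \beta^{(a,b)}
\end{aligned} \tag{10}
\]

for every $a, b \in \{1,\ldots,k\}$. The following is one way to formulate our task: Given $D$ and a clustering
(which in turn determines $E$ and $M$), determine whether there exist feasible $z$ and $B$ such that
$Q \succeq 0$; here, feasibility only requires $B$ to be symmetric with nonnegative entries and $B^{(a,a)} = 0$
for every $a \in \{1,\ldots,k\}$. We opt for a slightly more modest goal: Find $z = z(D)$ and $B = B(D)$
such that $Q \succeq 0$ for a large family of $D$'s.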

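Before determining $z$ and $B$, we first analyze $E$:

Lemma 5. Let $E$ be the matrix defined by (10). Then $\mathrm{rank}(E) \in \{1,2\}$. The eigenvalue of largest
magnitude is $\lambda \geq k$, and when $\mathrm{rank}(E) = 2$, the other nonzero eigenvalue of $E$ is negative. The
eigenvectors corresponding to nonzero eigenvalues lie in the span of $\{1_a\}_{a=1}^k$.

Proof. Writing

\[
E = \sum_{a=1}^k \sum_{b=1}^k \frac{1}{2} \Big( \frac{1}{n_a} + \frac{1}{n_b} \Big) 1_a 1_b^\top
= \frac{1}{2} \Big( \sum_{a=1}^k \frac{1}{n_a} 1_a \Big) 1^\top + \frac{1}{2} 1 \Big( \sum_{b=1}^k \frac{1}{n_b} 1_b \Big)^\top,
\]

we see that $\mathrm{rank}(E) \in \{1,2\}$, and it is easy to calculate $1^\top E 1 = Nk$ and $\mathrm{Tr}(E) = k$. Observe that

\[
\lambda = \sup_{\|x\| = 1} x^\top E x \geq \frac{1}{N} 1^\top E 1 = k,
\]

and combining with $\mathrm{rank}(E) \leq 2$ and $\mathrm{Tr}(E) = k$ then implies that the other nonzero eigenvalue (if
there is one) is negative. Finally, any eigenvector of $E$ with a nonzero eigenvalue necessarily lies in
the column space of $E$, which is a subspace of $\mathrm{span}\{1_a\}_{a=1}^k$ by the definition of $E$. □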
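
Lemma 5 can be sanity-checked numerically. The sketch below assumes the blockwise form $E^{(a,b)} = \frac{1}{2}(1/n_a + 1/n_b) 1 1^\top$, which is the form consistent with $1^\top E 1 = Nk$ and $\mathrm{Tr}(E) = k$, so that $E = \frac{1}{2}(w 1^\top + 1 w^\top)$ with $w_i = 1/n_a$ for $i$ in cluster $a$. The closed-form eigenpairs used here, $\frac{1}{2}(k \pm \|w\| \sqrt{N})$ with eigenvectors $w/\|w\| \pm 1/\sqrt{N}$, are the standard ones for a symmetrized rank-one matrix and are not stated in the text; the function names are hypothetical.

```python
import math


def build_E(sizes):
    """E with (i, j) entry (w_i + w_j) / 2, where w_i = 1/n_a for i in cluster a."""
    w = [1.0 / n for n in sizes for _ in range(n)]
    N = len(w)
    E = [[0.5 * (w[i] + w[j]) for j in range(N)] for i in range(N)]
    return E, w


def nonzero_eigenpairs(sizes):
    """Closed-form candidates for the (at most two) nonzero eigenpairs of E."""
    E, w = build_E(sizes)
    N, k = len(w), len(sizes)
    norm_w = math.sqrt(sum(v * v for v in w))  # equals sqrt(sum_a 1/n_a)
    norm_1 = math.sqrt(N)
    pairs = []
    for sign in (1, -1):
        lam = 0.5 * (k + sign * norm_w * norm_1)       # w . 1 = k
        vec = [v / norm_w + sign / norm_1 for v in w]  # constant on each cluster
        pairs.append((lam, vec))
    return E, pairs
```

Since each candidate eigenvector is constant on clusters, it lies in the span of the cluster indicators, in line with the last claim of the lemma.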

When finding $z$ and $B$ such that $Q = z(I - E) + M - B \succeq 0$, it will be useful that $I - E$ has
only one negative eigenvalue to correct. Let $v$ denote the corresponding eigenvector. Then we will
pick $B$ so that $v$ is also an eigenvector of $M - B$. Since we want $Q \succeq 0$ for as many instances
of $D$ as possible, we will then pick $z$ as large as possible, thereby sending $v$ to the nullspace of
$Q$. Unfortunately, the authors found that this constraint fails to uniquely determine $B$ in general.
Instead, we impose a stronger constraint:

\[
Q 1_a = 0 \qquad \forall a \in \{1,\ldots,k\}. \tag{11}
\]

(This constraint implies $Qv = 0$ by Lemma 5.) To see the implications of this constraint, note that
we already necessarily have

\[
(Q 1_a)_a = \big( (z(I - E) + M - B) 1_a \big)_a = z(I - E^{(a,a)}) 1 + M^{(a,a)} 1 = z \Big( 1 - \frac{1}{n_a} 1 1^\top 1 \Big) = 0,
\]

since $M^{(a,a)} 1 = 0$ and $B^{(a,a)} = 0$, and so it remains to impose

\[
\begin{aligned}
0 = (Q 1_b)_a &= \big( (z(I - E) + M - B) 1_b \big)_a \\
&= -z E^{(a,b)} 1 + M^{(a,b)} 1 - B^{(a,b)} 1 \\
&= -z \frac{n_a + n_b}{2 n_a} 1 + M^{(a,b)} 1 - B^{(a,b)} 1.
\end{aligned} \tag{12}
\]

In order for there to exist a vector $B^{(a,b)} 1 \geq 0$ that satisfies (12), $z$ must satisfy

\[
z \, \frac{n_a + n_b}{2 n_a} \leq \min(M^{(a,b)} 1),
\]

and since $z$ is independent of $(a,b)$, we conclude that

\[
z \leq \min_{\substack{a,b \in \{1,\ldots,k\} \\ a \neq b}} \frac{2 n_a}{n_a + n_b} \min(M^{(a,b)} 1). \tag{13}
\]

Again, in order to ensure $z(I - E) + M - B \succeq 0$ for as many instances of $D$ as possible, we intend
to choose $z$ as large as possible. Luckily, there is a choice of $B$ which satisfies (12) for every $(a,b)$,
even when $z$ satisfies equality in (13). Indeed, we define

\[
u_{(a,b)} := M^{(a,b)} 1 - z \, \frac{n_a + n_b}{2 n_a} 1, \qquad
\rho_{(a,b)} := u_{(a,b)}^\top 1, \qquad
B^{(a,b)} := \frac{1}{\rho_{(b,a)}} u_{(a,b)} u_{(b,a)}^\top \tag{14}
\]

for every $a, b \in \{1,\ldots,k\}$ with $a \neq b$. Then by design, $B$ immediately satisfies (12). Also, note
that $\rho_{(a,b)} = \rho_{(b,a)}$, and so $(B^{(b,a)})^\top = B^{(a,b)}$, meaning $B$ is symmetric. Finally, we necessarily
have $u_{(a,b)} \geq 0$ (and thus $\rho_{(a,b)} \geq 0$) by (13), and we implicitly require $\rho_{(a,b)} > 0$ for division to be
permissible. As such, we also have $B^{(a,b)} \geq 0$, as desired.
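
The construction can be traced on a small example. The sketch below assumes the blockwise forms $E^{(a,b)} = \frac{1}{2}(1/n_a + 1/n_b) 1 1^\top$ and $M^{(a,b)} = D^{(a,b)} - \frac{1}{n_a} D^{(a,a)} 1 1^\top - \frac{1}{n_b} 1 1^\top D^{(b,b)} + \frac{1}{2}(\frac{1}{n_a^2} 1^\top D^{(a,a)} 1 + \frac{1}{n_b^2} 1^\top D^{(b,b)} 1) 1 1^\top$, takes $z$ with equality in (13), and builds $B$ as in (14). The function name is hypothetical, and the check that $Q \succeq 0$ is omitted because it would require an eigenvalue routine.

```python
def dual_certificate(clusters):
    """Return (z, Q, B, labels) for the planted clustering (list of lists of points)."""
    pts = [p for c in clusters for p in c]
    lab = [a for a, c in enumerate(clusters) for _ in c]
    N, k = len(pts), len(clusters)
    n = [len(c) for c in clusters]
    D = [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in pts] for p in pts]
    # own[i] = e_i^T D^(a,a) 1 and tot[a] = 1^T D^(a,a) 1 for i in cluster a.
    own = [sum(D[i][j] for j in range(N) if lab[j] == lab[i]) for i in range(N)]
    tot = [sum(own[i] for i in range(N) if lab[i] == a) for a in range(k)]
    # M_ij = D_ij + h_i + h_j reproduces the assumed blockwise form of M.
    h = [tot[lab[i]] / (2 * n[lab[i]] ** 2) - own[i] / n[lab[i]] for i in range(N)]
    M = [[D[i][j] + h[i] + h[j] for j in range(N)] for i in range(N)]
    E = [[0.5 * (1 / n[lab[i]] + 1 / n[lab[j]]) for j in range(N)] for i in range(N)]
    # rows[i][b] = e_i^T M^(a,b) 1 for i in cluster a.
    rows = [[sum(M[i][j] for j in range(N) if lab[j] == b) for b in range(k)]
            for i in range(N)]
    # Equality in (13): the largest admissible z.
    z = min(2 * n[lab[i]] / (n[lab[i]] + n[b]) * rows[i][b]
            for i in range(N) for b in range(k) if b != lab[i])
    # (14): u_(a,b), rho_(a,b) and the rank-one blocks of B.
    u = [[rows[i][b] - z * (n[lab[i]] + n[b]) / (2 * n[lab[i]]) for b in range(k)]
         for i in range(N)]
    rho = [[sum(u[i][b] for i in range(N) if lab[i] == a) for b in range(k)]
           for a in range(k)]
    B = [[0.0 if lab[i] == lab[j] else u[i][lab[j]] * u[j][lab[i]] / rho[lab[j]][lab[i]]
          for j in range(N)] for i in range(N)]
    Q = [[z * ((i == j) - E[i][j]) + M[i][j] - B[i][j] for j in range(N)]
         for i in range(N)]
    return z, Q, B, lab
```

On generic, well-separated inputs every $\rho_{(a,b)}$ is positive; degenerate inputs (for example single-point clusters) can make some $\rho_{(a,b)}$ vanish, in which case the division above is not permissible, as noted in the text.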

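Now that we have selected $z$ and $B$, it remains to check that $Q \succeq 0$. By construction, we
already have $\Lambda := \mathrm{span}\{1_a\}_{a=1}^k$ in the nullspace of $Q$, and so it suffices to ensure

\[
0 \preceq P_{\Lambda^\perp} Q P_{\Lambda^\perp} = P_{\Lambda^\perp} \big( z(I - E) + M - B \big) P_{\Lambda^\perp} = z P_{\Lambda^\perp} + P_{\Lambda^\perp} (M - B) P_{\Lambda^\perp},
\]

which in turn is implied by

\[
\| P_{\Lambda^\perp} (M - B) P_{\Lambda^\perp} \|_{2 \to 2} \leq z.
\]

To summarize, we have the following result:

Theorem 6. Take $X := \sum_{t=1}^k \frac{1}{n_t} 1_t 1_t^\top$, where $n_t$ denotes the number of points in cluster $t$. Consider
$M$ and $B$ defined by (10) and (14), respectively, and let $\Lambda$ denote the span of $\{1_t\}_{t=1}^k$. Then $X$ is
a solution to the SDP relaxation (3) if

\[
\| P_{\Lambda^\perp} (M - B) P_{\Lambda^\perp} \|_{2 \to 2} \leq \min_{\substack{a,b \in \{1,\ldots,k\} \\ a \neq b}} \frac{2 n_a}{n_a + n_b} \min(M^{(a,b)} 1). \tag{15}
\]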


A sufficient condition for the hypothesis of Theorem 6 can be obtained by finding an upper bound on
the left-hand side of (15). This is Corollary 7, which we use to prove the main theorem.

Corollary 7. Take $X := \sum_{t=1}^k \frac{1}{n_t} 1_t 1_t^\top$, where $n_t$ denotes the number of points in cluster $t$. Let $\Psi$
denote the $m \times N$ matrix whose $(a,i)$th column is $x_{a,i} - c_a$, where

\[
c_a := \frac{1}{n_a} \sum_{i \in a} x_{a,i}
\]

denotes the empirical center of cluster $a$. Consider $M$ and $\rho_{(a,b)}$ defined by (10) and (14),
respectively. Then $X$ is a solution to the SDP relaxation (3) if

\[
2 \|\Psi\|_{2 \to 2}^2 + \sum_{a=1}^k \sum_{b=a+1}^k \frac{\| P_{1^\perp} M^{(a,b)} 1 \|_2 \, \| P_{1^\perp} M^{(b,a)} 1 \|_2}{\rho_{(a,b)}}
\leq \min_{\substack{a,b \in \{1,\ldots,k\} \\ a \neq b}} \frac{2 n_a}{n_a + n_b} \min(M^{(a,b)} 1).
\]

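Proof. First, the triangle inequality gives

\[
\| P_{\Lambda^\perp} (M - B) P_{\Lambda^\perp} \|_{2 \to 2} \leq \| P_{\Lambda^\perp} M P_{\Lambda^\perp} \|_{2 \to 2} + \| P_{\Lambda^\perp} B P_{\Lambda^\perp} \|_{2 \to 2}. \tag{16}
\]

We will bound the terms in (16) separately and then combine the bounds to derive a sufficient
condition for Theorem 6. To bound the first term in (16), let $\nu$ be the $N \times 1$ vector whose $(a,i)$th
entry is $\|x_{a,i}\|^2$, and let $\Phi$ be the $m \times N$ matrix whose $(a,i)$th column is $x_{a,i}$. Then

\[
D_{(a,i),(b,j)} = \|x_{a,i} - x_{b,j}\|^2 = \|x_{a,i}\|^2 - 2 x_{a,i}^\top x_{b,j} + \|x_{b,j}\|^2 = (\nu 1^\top - 2 \Phi^\top \Phi + 1 \nu^\top)_{(a,i),(b,j)},
\]

meaning $D = \nu 1^\top - 2 \Phi^\top \Phi + 1 \nu^\top$. With this, we appeal to the blockwise definition of $M$ (10):

\[
\begin{aligned}
\| P_{\Lambda^\perp} M P_{\Lambda^\perp} \|_{2 \to 2} = \| P_{\Lambda^\perp} D P_{\Lambda^\perp} \|_{2 \to 2}
&= \| P_{\Lambda^\perp} (\nu 1^\top - 2 \Phi^\top \Phi + 1 \nu^\top) P_{\Lambda^\perp} \|_{2 \to 2} \\
&= 2 \| P_{\Lambda^\perp} \Phi^\top \Phi P_{\Lambda^\perp} \|_{2 \to 2} = 2 \| \Phi P_{\Lambda^\perp} \|_{2 \to 2}^2 = 2 \| \Psi \|_{2 \to 2}^2,
\end{aligned}
\]

where the last step uses $\Phi P_{\Lambda^\perp} = \Psi$, since projecting onto $\Lambda^\perp$ subtracts from each column of $\Phi$ the
empirical center of its cluster.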

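For the second term in (16), we first write the decomposition

\[
B = \sum_{a=1}^k \sum_{b=a+1}^k \Big( H_{(a,b)}(B^{(a,b)}) + H_{(b,a)}(B^{(b,a)}) \Big),
\]

where $H_{(a,b)} \colon \mathbb{R}^{n_a \times n_b} \to \mathbb{R}^{N \times N}$ produces a matrix whose $(a,b)$th block is the input matrix, and is
otherwise zero. Then

\[
\begin{aligned}
P_{\Lambda^\perp} B P_{\Lambda^\perp}
&= \sum_{a=1}^k \sum_{b=a+1}^k P_{\Lambda^\perp} \Big( H_{(a,b)}(B^{(a,b)}) + H_{(b,a)}(B^{(b,a)}) \Big) P_{\Lambda^\perp} \\
&= \sum_{a=1}^k \sum_{b=a+1}^k \Big( H_{(a,b)}(P_{1^\perp} B^{(a,b)} P_{1^\perp}) + H_{(b,a)}(P_{1^\perp} B^{(b,a)} P_{1^\perp}) \Big),
\end{aligned}
\]

and so the triangle inequality gives

\[
\begin{aligned}
\| P_{\Lambda^\perp} B P_{\Lambda^\perp} \|_{2 \to 2}
&\leq \sum_{a=1}^k \sum_{b=a+1}^k \big\| H_{(a,b)}(P_{1^\perp} B^{(a,b)} P_{1^\perp}) + H_{(b,a)}(P_{1^\perp} B^{(b,a)} P_{1^\perp}) \big\|_{2 \to 2} \\
&= \sum_{a=1}^k \sum_{b=a+1}^k \| P_{1^\perp} B^{(a,b)} P_{1^\perp} \|_{2 \to 2},
\end{aligned}
\]

where the last equality can be verified by considering the spectrum of the square: writing
$C := P_{1^\perp} B^{(a,b)} P_{1^\perp}$ and using $B^{(b,a)} = (B^{(a,b)})^\top$,

\[
\big( H_{(a,b)}(C) + H_{(b,a)}(C^\top) \big)^2 = H_{(a,a)}(C C^\top) + H_{(b,b)}(C^\top C).
\]

At this point, we use the definition of $B$ (14) to get

\[
\| P_{1^\perp} B^{(a,b)} P_{1^\perp} \|_{2 \to 2} = \frac{\| P_{1^\perp} u_{(a,b)} \|_2 \, \| P_{1^\perp} u_{(b,a)} \|_2}{\rho_{(a,b)}}.
\]

Recalling the definition of $u_{(a,b)}$ (14) and combining these estimates then produces the result. □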


4 Proof of main result 

In this section, we apply the certificate from Corollary 7 to the $(\mathcal{D}, \gamma, n)$-stochastic ball model (see
Definition 1) to prove our main result. We will prove Theorem 2 with the help of several lemmas.

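Lemma 8. Denote

\[
c_a := \frac{1}{n} \sum_{i=1}^n x_{a,i}, \qquad \Delta_{ab} := \|\gamma_a - \gamma_b\|_2, \qquad O_{ab} := \frac{\gamma_a + \gamma_b}{2}.
\]

Then the $(\mathcal{D},\gamma,n)$-stochastic ball model satisfies the following estimates:

\[
\|c_a - \gamma_a\|_2 < \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{m,\epsilon}(n)}, \tag{17}
\]

\[
\Big| \frac{1}{n} \sum_{i=1}^n \|r_{a,i}\|^2 - \mathbb{E}\|r\|^2 \Big| < \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\epsilon}(n)}, \tag{18}
\]

\[
\Big| \frac{1}{n} \sum_{i=1}^n \|x_{a,i} - O_{ab}\|^2 - \mathbb{E}\|r + \gamma_a - O_{ab}\|^2 \Big| < \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\Delta_{ab},\epsilon}(n)}. \tag{19}
\]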


Proof. Since $\mathbb{E} r = 0$ and $\|r\|^2 \leq 1$ almost surely, one may lift

\[
X_{a,i} := \begin{bmatrix} 0 & r_{a,i}^\top \\ r_{a,i} & 0 \end{bmatrix}
\]

and apply the Matrix Hoeffding inequality to conclude that

\[
\Pr\Big( \Big\| \sum_{i=1}^n r_{a,i} \Big\|_2 \geq t \Big) \leq (m+1) e^{-t^2/(8n)}.
\]

Taking $t := \epsilon n$ then gives (17). For (18) and (19), notice that the random variables in each
sum are iid and confined to an interval almost surely, and so the result follows from Hoeffding's
inequality. □

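Lemma 9. Under the $(\mathcal{D},\gamma,n)$-stochastic ball model, we have $D^{(a,b)} 1 - D^{(a,a)} 1 = 4np + q$, where

\[
\begin{aligned}
p_i &:= r_{a,i}^\top (\gamma_a - O_{ab}) + \frac{\Delta_{ab}^2}{4}, \\
q_i &:= 2n (x_{a,i} - O_{ab})^\top \big( (c_a - c_b) - (\gamma_a - \gamma_b) \big) + \Big( \sum_{j=1}^n \|x_{b,j} - O_{ab}\|^2 - \sum_{j=1}^n \|x_{a,j} - O_{ab}\|^2 \Big),
\end{aligned}
\]

and $|q_i| \leq (6 + \Delta_{ab}) n \epsilon$ with probability $1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}$.

Proof. Add and subtract $O_{ab}$ and then expand the squares to get

\[
\begin{aligned}
e_i^\top \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big)
&= \sum_{j=1}^n \|x_{a,i} - x_{b,j}\|^2 - \sum_{j=1}^n \|x_{a,i} - x_{a,j}\|^2 \\
&= n \Big( \|x_{a,i} - O_{ab}\|^2 - 2 (x_{a,i} - O_{ab})^\top (c_b - O_{ab}) + \frac{1}{n} \sum_{j=1}^n \|x_{b,j} - O_{ab}\|^2 \Big) \\
&\quad - n \Big( \|x_{a,i} - O_{ab}\|^2 - 2 (x_{a,i} - O_{ab})^\top (c_a - O_{ab}) + \frac{1}{n} \sum_{j=1}^n \|x_{a,j} - O_{ab}\|^2 \Big) \\
&= 2n (x_{a,i} - O_{ab})^\top (c_a - c_b) + \Big( \sum_{j=1}^n \|x_{b,j} - O_{ab}\|^2 - \sum_{j=1}^n \|x_{a,j} - O_{ab}\|^2 \Big).
\end{aligned}
\]

Add and subtract $\gamma_a - \gamma_b$ to $c_a - c_b$ and distribute over the resulting sum to obtain

\[
\begin{aligned}
e_i^\top \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big)
&= 2n (x_{a,i} - O_{ab})^\top (\gamma_a - \gamma_b) + q_i \\
&= 4n \big( r_{a,i} + (\gamma_a - O_{ab}) \big)^\top (\gamma_a - O_{ab}) + q_i.
\end{aligned}
\]

Distributing and identifying $\|\gamma_a - O_{ab}\|^2 = \Delta_{ab}^2/4$ explains the definition of $p$. To show $|q_i| \leq
(6 + \Delta_{ab}) n \epsilon$, apply the triangle and Cauchy-Schwarz inequalities to obtain

\[
\begin{aligned}
|q_i|
&\leq 2n \Big| (x_{a,i} - O_{ab})^\top \big( (c_a - c_b) - (\gamma_a - \gamma_b) \big) \Big| + \Big| \sum_{j=1}^n \|x_{b,j} - O_{ab}\|^2 - \sum_{j=1}^n \|x_{a,j} - O_{ab}\|^2 \Big| \\
&\leq 2n \big( \|r_{a,i}\| + \|\gamma_a - O_{ab}\| \big) \big( \|c_a - \gamma_a\| + \|c_b - \gamma_b\| \big) + \Big| \sum_{j=1}^n \|x_{b,j} - O_{ab}\|^2 - \sum_{j=1}^n \|x_{a,j} - O_{ab}\|^2 \Big| \\
&\leq 2n \Big( 1 + \frac{\Delta_{ab}}{2} \Big) \big( \|c_a - \gamma_a\| + \|c_b - \gamma_b\| \big) + \Big| \sum_{j=1}^n \|x_{b,j} - O_{ab}\|^2 - \sum_{j=1}^n \|x_{a,j} - O_{ab}\|^2 \Big|.
\end{aligned}
\]

To finish the argument, apply (17) to the first term while adding and subtracting

\[
\mathbb{E}\|r + \gamma_a - O_{ab}\|^2 = \mathbb{E}\|r + \gamma_b - O_{ab}\|^2
\]

from the second and apply (19). □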
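
The decomposition in Lemma 9 is a purely algebraic identity, so it can be verified on any sample. The sketch below assumes the forms $p_i = r_{a,i}^\top(\gamma_a - O_{ab}) + \Delta_{ab}^2/4$ and $q_i = 2n(x_{a,i} - O_{ab})^\top((c_a - c_b) - (\gamma_a - \gamma_b)) + (\sum_j \|x_{b,j} - O_{ab}\|^2 - \sum_j \|x_{a,j} - O_{ab}\|^2)$; the function name and the rejection sampler for the unit ball are illustrative choices.

```python
import random


def lemma9_residual(gamma_a, gamma_b, n=5, seed=1):
    """Largest entrywise gap between D^(a,b)1 - D^(a,a)1 and 4np + q on one sample."""
    rng = random.Random(seed)
    m = len(gamma_a)

    def draw():
        # Rejection sampling from the cube gives a uniform point of the unit ball.
        while True:
            r = [rng.uniform(-1.0, 1.0) for _ in range(m)]
            if sum(v * v for v in r) <= 1.0:
                return r

    def sq(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def dot(p, q):
        return sum(x * y for x, y in zip(p, q))

    ra = [draw() for _ in range(n)]
    rb = [draw() for _ in range(n)]
    xa = [[g + v for g, v in zip(gamma_a, r)] for r in ra]
    xb = [[g + v for g, v in zip(gamma_b, r)] for r in rb]
    O = [(s + t) / 2 for s, t in zip(gamma_a, gamma_b)]
    ca = [sum(col) / n for col in zip(*xa)]
    cb = [sum(col) / n for col in zip(*xb)]
    half_gap = [s - o for s, o in zip(gamma_a, O)]            # gamma_a - O_ab
    drift = [(s - t) - (g - h) for s, t, g, h in zip(ca, cb, gamma_a, gamma_b)]
    tail = sum(sq(x, O) for x in xb) - sum(sq(x, O) for x in xa)
    delta_sq = sq(gamma_a, gamma_b)
    worst = 0.0
    for i in range(n):
        lhs = sum(sq(xa[i], y) for y in xb) - sum(sq(xa[i], y) for y in xa)
        p_i = dot(ra[i], half_gap) + delta_sq / 4
        q_i = 2 * n * dot([x - o for x, o in zip(xa[i], O)], drift) + tail
        worst = max(worst, abs(lhs - (4 * n * p_i + q_i)))
    return worst
```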

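Lemma 10. Under the $(\mathcal{D},\gamma,n)$-stochastic ball model, we have

\[
\Big| \frac{1}{n} 1^\top D^{(a,a)} 1 - 2n \, \mathbb{E}\|r\|^2 \Big| \leq 4 n \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{m,\epsilon}(n)}.
\]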


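Proof. Add and subtract $\gamma_a$ and expand the square to get

\[
\frac{1}{n} 1^\top D^{(a,a)} 1 = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n \|r_{a,i} - r_{a,j}\|^2
= \sum_{i=1}^n \Big( \|r_{a,i}\|^2 - 2 r_{a,i}^\top (c_a - \gamma_a) + \frac{1}{n} \sum_{j=1}^n \|r_{a,j}\|^2 \Big).
\]

The triangle and Cauchy-Schwarz inequalities then give

\[
\begin{aligned}
\Big| \frac{1}{n} 1^\top D^{(a,a)} 1 - 2n \, \mathbb{E}\|r\|^2 \Big|
&= \Big| \sum_{i=1}^n \Big( \|r_{a,i}\|^2 - 2 r_{a,i}^\top (c_a - \gamma_a) + \frac{1}{n} \sum_{j=1}^n \|r_{a,j}\|^2 \Big) - 2n \, \mathbb{E}\|r\|^2 \Big| \\
&\leq n \Big| \frac{1}{n} \sum_{i=1}^n \|r_{a,i}\|^2 - \mathbb{E}\|r\|^2 \Big| + 2 \sum_{i=1}^n |r_{a,i}^\top (c_a - \gamma_a)| + n \Big| \frac{1}{n} \sum_{j=1}^n \|r_{a,j}\|^2 - \mathbb{E}\|r\|^2 \Big| \\
&\leq n \Big| \frac{1}{n} \sum_{i=1}^n \|r_{a,i}\|^2 - \mathbb{E}\|r\|^2 \Big| + 2 \sum_{i=1}^n \|r_{a,i}\| \, \|c_a - \gamma_a\| + n \Big| \frac{1}{n} \sum_{j=1}^n \|r_{a,j}\|^2 - \mathbb{E}\|r\|^2 \Big| \\
&\leq 4 n \epsilon,
\end{aligned}
\]

where the last step occurs with probability $1 - e^{-\Omega_{m,\epsilon}(n)}$ by a union bound over (18) and (17). □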

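Lemma 11. Under the $(\mathcal{D},\gamma,n)$-stochastic ball model, we have

\[
1^\top D^{(a,b)} 1 - 1^\top D^{(a,a)} 1 \geq n^2 \Delta_{ab}^2 - (6 + 3\Delta_{ab}) n^2 \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}.
\]

Proof. Lemma 9 gives

\[
\begin{aligned}
1^\top D^{(a,b)} 1 - 1^\top D^{(a,a)} 1 = 1^\top (4np + q)
&\geq 4n \sum_{i=1}^n \Big( r_{a,i}^\top (\gamma_a - O_{ab}) + \|\gamma_a - O_{ab}\|^2 \Big) - (6 + \Delta_{ab}) n^2 \epsilon \\
&\geq 4n \Big( n (c_a - \gamma_a)^\top (\gamma_a - O_{ab}) + \frac{n \Delta_{ab}^2}{4} \Big) - (6 + \Delta_{ab}) n^2 \epsilon.
\end{aligned}
\]

Cauchy-Schwarz along with (17) then gives the result. □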

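Lemma 12. Under the $(\mathcal{D},\gamma,n)$-stochastic ball model, there exists $C = C(\gamma)$ such that

\[
\min_{a \neq b} \min(M^{(a,b)} 1) \geq n \Delta (\Delta - 2) - C n \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{m,\gamma,\epsilon}(n)},
\]

where $\Delta := \min_{a \neq b} \Delta_{ab}$.

Proof. Fix $a$ and $b$. Then by Lemma 9, the following holds with probability $1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}$:

\[
\begin{aligned}
\min \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big)
&\geq 4n \min_{i \in \{1,\ldots,n\}} \Big( r_{a,i}^\top (\gamma_a - O_{ab}) + \frac{\Delta_{ab}^2}{4} \Big) - (6 + \Delta_{ab}) n \epsilon \\
&\geq n \Delta_{ab}^2 - 2n \Delta_{ab} - (6 + \Delta_{ab}) n \epsilon,
\end{aligned}
\]

where the last step is by Cauchy-Schwarz. Taking a union bound with Lemma 10 then gives

\[
\begin{aligned}
\min(M^{(a,b)} 1)
&= \min \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big) + \frac{1}{2} \Big( \frac{1}{n} 1^\top D^{(a,a)} 1 - \frac{1}{n} 1^\top D^{(b,b)} 1 \Big) \\
&\geq \min \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big) - \frac{1}{2} \Big| \frac{1}{n} 1^\top D^{(a,a)} 1 - 2n \, \mathbb{E}\|r\|^2 \Big| - \frac{1}{2} \Big| \frac{1}{n} 1^\top D^{(b,b)} 1 - 2n \, \mathbb{E}\|r\|^2 \Big| \\
&\geq n \Delta_{ab} (\Delta_{ab} - 2) - (10 + \Delta_{ab}) n \epsilon
\end{aligned}
\]

with probability $1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}$. The result then follows from a union bound over $a$ and $b$. □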

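Lemma 13. Suppose $\epsilon \leq 1$. Then there exists $C = C(\Delta_{ab}, m)$ such that under the $(\mathcal{D},\gamma,n)$-
stochastic ball model, we have

\[
\| P_{1^\perp} M^{(a,b)} 1 \|_2^2 \leq \frac{4 n^3 \Delta_{ab}^2}{m} + C n^3 \epsilon
\]

with probability $1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}$.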

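Proof. First, a quick calculation reveals

\[
\begin{aligned}
e_i^\top M^{(a,b)} 1 &= e_i^\top D^{(a,b)} 1 - e_i^\top D^{(a,a)} 1 + \frac{1}{2} \Big( \frac{1}{n} 1^\top D^{(a,a)} 1 - \frac{1}{n} 1^\top D^{(b,b)} 1 \Big), \\
\frac{1}{n} 1^\top M^{(a,b)} 1 &= \frac{1}{n} 1^\top D^{(a,b)} 1 - \frac{1}{n} 1^\top D^{(a,a)} 1 + \frac{1}{2} \Big( \frac{1}{n} 1^\top D^{(a,a)} 1 - \frac{1}{n} 1^\top D^{(b,b)} 1 \Big),
\end{aligned}
\]

from which it follows that

\[
e_i^\top P_{1^\perp} M^{(a,b)} 1 = e_i^\top M^{(a,b)} 1 - \frac{1}{n} 1^\top M^{(a,b)} 1 = e_i^\top P_{1^\perp} \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big).
\]

As such, we have

\[
\| P_{1^\perp} M^{(a,b)} 1 \|^2 = \| D^{(a,b)} 1 - D^{(a,a)} 1 \|^2 - \| P_1 (D^{(a,b)} 1 - D^{(a,a)} 1) \|^2. \tag{20}
\]

To bound the first term, we apply the triangle inequality over Lemma 9:

\[
\| D^{(a,b)} 1 - D^{(a,a)} 1 \| \leq 4n \|p\| + \|q\| \leq 4n \|p\| + (6 + \Delta_{ab}) n^{3/2} \epsilon. \tag{21}
\]

We proceed by bounding $\|p\|$. To this end, note that the $p_i$'s are iid random variables whose
outcomes lie in a finite interval (of width determined by $\Delta_{ab}$) with probability 1. As such, Hoeffding's
inequality gives

\[
\Big| \frac{1}{n} \sum_{i=1}^n p_i^2 - \mathbb{E} p_1^2 \Big| \leq \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\Delta_{ab},\epsilon}(n)}.
\]

With this, we then have

\[
\|p\|^2 = n \Big( \frac{1}{n} \sum_{i=1}^n p_i^2 - \mathbb{E} p_1^2 + \mathbb{E} p_1^2 \Big) \leq n \, \mathbb{E} p_1^2 + n \epsilon \tag{22}
\]

in the same event. To determine $\mathbb{E} p_1^2$, first take $r_1 := e_1^\top r$. Then since the distribution of $r$ is
rotation invariant, we may write

\[
p_1 = r_{a,1}^\top (\gamma_a - O_{ab}) + \|\gamma_a - O_{ab}\|^2 = \frac{\Delta_{ab}}{2} r_1 + \frac{\Delta_{ab}^2}{4},
\]

where the second equality above is equality in distribution. We then have

\[
\mathbb{E} p_1^2 = \mathbb{E} \Big( \frac{\Delta_{ab}}{2} r_1 + \frac{\Delta_{ab}^2}{4} \Big)^2 = \frac{\Delta_{ab}^2}{4} \mathbb{E} r_1^2 + \frac{\Delta_{ab}^4}{16}. \tag{23}
\]

We also note that $1 \geq \mathbb{E}\|r\|^2 = m \, \mathbb{E} r_1^2$ by linearity of expectation, and so

\[
\mathbb{E} r_1^2 \leq \frac{1}{m}. \tag{24}
\]

Combining (21), (22), (23) and (24) then gives

\[
\| D^{(a,b)} 1 - D^{(a,a)} 1 \| \leq \Big( \frac{4 n^3 \Delta_{ab}^2}{m} + n^3 \Delta_{ab}^4 + 16 n^3 \epsilon \Big)^{1/2} + (6 + \Delta_{ab}) n^{3/2} \epsilon. \tag{25}
\]

To bound the second term of (20), first note that

\[
\| P_1 (D^{(a,b)} 1 - D^{(a,a)} 1) \| = \frac{1}{\sqrt{n}} \big| 1^\top D^{(a,b)} 1 - 1^\top D^{(a,a)} 1 \big|. \tag{26}
\]

Lemma 11 then gives

\[
\big| 1^\top D^{(a,b)} 1 - 1^\top D^{(a,a)} 1 \big| \geq 1^\top D^{(a,b)} 1 - 1^\top D^{(a,a)} 1 \geq n^2 \Delta_{ab}^2 - (6 + 3\Delta_{ab}) n^2 \epsilon \tag{27}
\]

with probability $1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}$. Using (20) to combine (25) with (26) and (27) then gives the
result. □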

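Lemma 14. There exists $C = C(\gamma)$ such that under the $(\mathcal{D}, \gamma, n)$-stochastic ball model, we have

\[
\rho_{(a,b)} \geq 2 n^2 \Delta - C n^2 \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\mathcal{D},\gamma,\epsilon}(n)}.
\]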


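Proof. Recall from (14) that

\[
\rho_{(a,b)} = u_{(a,b)}^\top 1 = 1^\top M^{(a,b)} 1 - nz = 1^\top M^{(a,b)} 1 - n \min_{a \neq b} \min(M^{(a,b)} 1). \tag{28}
\]

To bound the first term, we leverage Lemma 11:

\[
1^\top M^{(a,b)} 1 = 1^\top D^{(a,b)} 1 - \frac{1}{2} \big( 1^\top D^{(a,a)} 1 + 1^\top D^{(b,b)} 1 \big) \geq n^2 \Delta_{ab}^2 - (6 + 3\Delta_{ab}) n^2 \epsilon
\]

with probability $1 - e^{-\Omega_{m,\Delta_{ab},\epsilon}(n)}$. To bound the second term in (28), note from Lemma 10 that

\[
\begin{aligned}
\min(M^{(a,b)} 1)
&= \min \Big( D^{(a,b)} 1 - D^{(a,a)} 1 + \frac{1}{2} \Big( \frac{1}{n} 1^\top D^{(a,a)} 1 - \frac{1}{n} 1^\top D^{(b,b)} 1 \Big) 1 \Big) \\
&\leq \min \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big) + \frac{1}{2} \Big( \Big| \frac{1}{n} 1^\top D^{(a,a)} 1 - 2n \, \mathbb{E}\|r\|^2 \Big| + \Big| \frac{1}{n} 1^\top D^{(b,b)} 1 - 2n \, \mathbb{E}\|r\|^2 \Big| \Big) \\
&\leq \min \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big) + 4 n \epsilon
\end{aligned}
\]

with probability $1 - e^{-\Omega_{m,\epsilon}(n)}$. Next, Lemma 9 gives

\[
\min \big( D^{(a,b)} 1 - D^{(a,a)} 1 \big) \leq n \Delta_{ab}^2 + (6 + \Delta_{ab}) n \epsilon + 4n \min_{i \in \{1,\ldots,n\}} r_{a,i}^\top (\gamma_a - O_{ab}).
\]

By assumption, we know $\|r\| \geq 1 - \epsilon$ with positive probability regardless of $\epsilon > 0$. It then follows
that

\[
r^\top (\gamma_a - O_{ab}) \leq -\frac{\Delta_{ab}}{2} + \epsilon
\]

with some ($\epsilon$-dependent) positive probability. As such, we may conclude that

\[
\min_{i \in \{1,\ldots,n\}} r_{a,i}^\top (\gamma_a - O_{ab}) \leq -\frac{\Delta_{ab}}{2} + \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\mathcal{D},\gamma,\epsilon}(n)}.
\]

Combining these estimates then gives

\[
\min(M^{(a,b)} 1) \leq n \Delta_{ab}^2 - 2n \Delta_{ab} + (14 + \Delta_{ab}) n \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\mathcal{D},\gamma,\epsilon}(n)}.
\]

Performing a union bound over $a$ and $b$ then gives

\[
\min_{a \neq b} \min(M^{(a,b)} 1) \leq n \Delta^2 - 2n \Delta + (14 + \Delta) n \epsilon \qquad \text{w.p. } 1 - e^{-\Omega_{\mathcal{D},\gamma,\epsilon}(n)}.
\]

Combining these estimates with $\Delta_{ab} \geq \Delta$ then gives the result. □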

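Lemma 15. Under the $(\mathcal{D},\gamma,n)$-stochastic ball model, we have

\[
\|\Psi\|_{2 \to 2} \leq \Big( \frac{(1 + \epsilon)\sigma}{\sqrt{m}} + \epsilon \Big) \sqrt{N} \qquad \text{w.p. } 1 - e^{-\Omega_{m,\sigma,\epsilon}(n)},
\]

where $\sigma^2 := \mathbb{E}\|r\|^2$ for $r \sim \mathcal{D}$.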

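Proof. Let $R$ denote the matrix whose $(a,i)$th column is $r_{a,i}$. Then

\[
\Psi = R - \big[ (c_1 - \gamma_1) 1^\top \ \cdots \ (c_k - \gamma_k) 1^\top \big],
\]

and so the triangle inequality gives

\[
\|\Psi\|_{2 \to 2} \leq \|R\|_{2 \to 2} + \big\| \big[ (c_1 - \gamma_1) 1^\top \ \cdots \ (c_k - \gamma_k) 1^\top \big] \big\|_{2 \to 2}
\leq \|R\|_{2 \to 2} + \Big( n \sum_{a=1}^k \|c_a - \gamma_a\|^2 \Big)^{1/2},
\]

where the last estimate passes to the Frobenius norm. For the first term, since $R$ is rotation
invariant, we may apply Theorem 5.41 in the cited reference to obtain

\[
\|R\|_{2 \to 2} \leq (1 + \epsilon) \sigma \sqrt{\frac{N}{m}} \qquad \text{w.p. } 1 - e^{-\Omega_{m,\sigma,\epsilon}(n)}.
\]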


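For the second term, apply (17). The union bound then gives the result. □

Proof of Theorem 2. First, we combine Lemmas 13, 14 and 15: For each $\epsilon > 0$, we have

\[
2 \|\Psi\|_{2 \to 2}^2 + \sum_{a=1}^k \sum_{b=a+1}^k \frac{\| P_{1^\perp} M^{(a,b)} 1 \|_2 \, \| P_{1^\perp} M^{(b,a)} 1 \|_2}{\rho_{(a,b)}}
\leq 2 \Big( \frac{(1 + \epsilon)\sigma}{\sqrt{m}} + \epsilon \Big)^2 nk + \sum_{a=1}^k \sum_{b=a+1}^k \frac{4 n^3 \Delta_{ab}^2 / m + C n^3 \epsilon}{2 n^2 \Delta - C n^2 \epsilon}
\]

with probability $1 - e^{-\Omega_{\mathcal{D},\gamma,\epsilon}(n)}$. Furthermore, for every $\delta > 0$, there exists an $\epsilon > 0$ such that

\[
2 \Big( \frac{1 + \epsilon}{\sqrt{m}} + \epsilon \Big)^2 nk + \sum_{a=1}^k \sum_{b=a+1}^k \frac{4 n^3 \Delta_{ab}^2 / m + C n^3 \epsilon}{2 n^2 \Delta - C n^2 \epsilon}
< n \Big( \frac{2k}{m} + \frac{k(k-1) \Delta \, \mathrm{Cond}(\gamma)}{m} + \delta \Big).
\]

Considering Lemma 12, it then suffices to have

\[
\frac{2k}{m} + \frac{k(k-1) \Delta \, \mathrm{Cond}(\gamma)}{m} < \Delta (\Delta - 2).
\]

Rearranging then gives

\[
\Delta > 2 + \frac{2k}{m \Delta} + \frac{k(k-1)}{m} \mathrm{Cond}(\gamma),
\]

which is implied by the hypothesis since $\Delta \geq 2$ and $\mathrm{Cond}(\gamma) \geq 1$. □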